Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated by computer from raw corpora.
The information has therefore not been validated.

Technology of Text Mining

Internal identifier: 001A14 (Main/Exploration); previous: 001A13; next: 001A15

Authors: Ari Visa [Finland]

RBID : ISTEX:A9D55CDEED0425A739C61C52479F43C882308A8B

Abstract

Abstract: A large amount of information is stored in databases, on intranets, and on the Internet. This information is organised in documents or in text documents; the difference between the two depends on whether pictures, tables, figures, and formulas are included. The common problem is to find the desired piece of information, a trend, or an undiscovered pattern in these sources. The problem is not a new one. Traditionally it has been considered under the heading of information seeking, that is, the science of how to find a book in a library, and it has been solved either by classifying and accessing documents with the Dewey Decimal Classification system or by assigning a number of characteristic keywords. The difficulty nowadays is that there are many unclassified documents in company databases, on intranets, and on the Internet. First, some terms are defined. Text filtering is an information-seeking process in which documents are selected from a dynamic text stream. Text mining is a process of analysing text to extract information from it for particular purposes. Text categorisation is the process of clustering similar documents from a large document set. These terms overlap to some degree. Text mining, also known as document information mining, text data mining, or knowledge discovery in textual databases, is an emerging technology for analysing large collections of unstructured documents in order to extract interesting and non-trivial patterns or knowledge. Typical subproblems that have been solved are language identification, feature selection/extraction, clustering, natural language processing, summarisation, categorisation, search, indexing, and visualisation. These subproblems are discussed in detail and the most common approaches are given. Finally, some examples of current uses of text mining are given and some potential application areas are mentioned.
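The abstract defines text filtering only as selecting documents from a dynamic text stream. It does not say how the selection is made. The following is a minimal sketch that assumes a simple keyword profile, in the spirit of the "characteristic keywords" approach mentioned above, together with crude whitespace tokenisation. It is not the chapter's method. The function name `filter_stream` and the `min_hits` threshold are hypothetical.

```python
from typing import Iterable, Iterator


def filter_stream(docs: Iterable[str], profile: set[str], min_hits: int = 1) -> Iterator[str]:
    """Yield each document from a stream that shares at least `min_hits`
    keywords with a lowercase keyword profile (hypothetical keyword filter)."""
    for doc in docs:
        # Crude tokenisation: split on whitespace, strip punctuation, lowercase.
        tokens = {t.strip(".,;:!?()\"'").lower() for t in doc.split()}
        if len(tokens & profile) >= min_hits:
            yield doc


stream = [
    "Text mining finds patterns in documents.",
    "The weather is nice today.",
    "Clustering groups similar documents.",
]
for doc in filter_stream(stream, {"mining", "clustering"}):
    print(doc)
```

Because `filter_stream` is a generator, it consumes the input lazily and can therefore sit on an unbounded stream. A practical filter would add stemming and term weighting rather than raw keyword overlap.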

DOI: 10.1007/3-540-44596-X_1



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Technology of Text Mining</title>
<author>
<name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:A9D55CDEED0425A739C61C52479F43C882308A8B</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1007/3-540-44596-X_1</idno>
<idno type="url">https://api.istex.fr/document/A9D55CDEED0425A739C61C52479F43C882308A8B/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000317</idno>
<idno type="wicri:Area/Istex/Curation">000312</idno>
<idno type="wicri:Area/Istex/Checkpoint">001068</idno>
<idno type="wicri:doubleKey">0302-9743:2001:Visa A:technology:of:text</idno>
<idno type="wicri:Area/Main/Merge">001B07</idno>
<idno type="wicri:Area/Main/Curation">001A14</idno>
<idno type="wicri:Area/Main/Exploration">001A14</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Technology of Text Mining</title>
<author>
<name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Tampere University of Technology, FIN-33101, P.O. Box 553, Tampere</wicri:regionArea>
<wicri:noRegion>Tampere</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Finlande</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2001</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">A9D55CDEED0425A739C61C52479F43C882308A8B</idno>
<idno type="DOI">10.1007/3-540-44596-X_1</idno>
<idno type="ChapterID">1</idno>
<idno type="ChapterID">Chap1</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: A large amount of information is stored in databases, on intranets, and on the Internet. This information is organised in documents or in text documents; the difference between the two depends on whether pictures, tables, figures, and formulas are included. The common problem is to find the desired piece of information, a trend, or an undiscovered pattern in these sources. The problem is not a new one. Traditionally it has been considered under the heading of information seeking, that is, the science of how to find a book in a library, and it has been solved either by classifying and accessing documents with the Dewey Decimal Classification system or by assigning a number of characteristic keywords. The difficulty nowadays is that there are many unclassified documents in company databases, on intranets, and on the Internet. First, some terms are defined. Text filtering is an information-seeking process in which documents are selected from a dynamic text stream. Text mining is a process of analysing text to extract information from it for particular purposes. Text categorisation is the process of clustering similar documents from a large document set. These terms overlap to some degree. Text mining, also known as document information mining, text data mining, or knowledge discovery in textual databases, is an emerging technology for analysing large collections of unstructured documents in order to extract interesting and non-trivial patterns or knowledge. Typical subproblems that have been solved are language identification, feature selection/extraction, clustering, natural language processing, summarisation, categorisation, search, indexing, and visualisation. These subproblems are discussed in detail and the most common approaches are given. Finally, some examples of current uses of text mining are given and some potential application areas are mentioned.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Finlande</li>
</country>
</list>
<tree>
<country name="Finlande">
<noRegion>
<name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
</noRegion>
<name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
</country>
</tree>
</affiliations>
</record>
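The record above can also be read without the Dilib tools. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the record. As displayed here, the record uses the `wicri:` prefix without declaring it, so a strict XML parser rejects it as an unbound prefix. The sketch therefore declares the prefix with a placeholder URI (`urn:example:wicri`), which is an assumption and not the real Wicri namespace. The function name `parse_record` and the choice of fields are illustrative only.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (an assumption): the record as displayed uses the
# "wicri:" prefix without an xmlns declaration, which expat rejects.
WICRI_NS = "urn:example:wicri"

# Trimmed copy of the record above, keeping only the fields used below.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Technology of Text Mining</title>
<author><name sortKey="Visa, Ari">Ari Visa</name></author>
</titleStmt>
<publicationStmt>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1007/3-540-44596-X_1</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<author>
<name>Ari Visa</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Finlande</country>
<wicri:regionArea>Tampere University of Technology, FIN-33101, P.O. Box 553, Tampere</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(xml_text: str) -> dict:
    """Extract a few bibliographic fields from a Wicri/Dilib <record> (hypothetical helper)."""
    # Declare the wicri prefix on the root element so the text is well-formed XML.
    fixed = xml_text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    header = ET.fromstring(fixed).find("TEI/teiHeader")
    # Collect <idno> values by their "type" attribute, keeping the first of each.
    idnos = {}
    for idno in header.iter("idno"):
        idnos.setdefault(idno.get("type"), idno.text)
    return {
        "title": header.findtext("fileDesc/titleStmt/title"),
        "author": header.findtext("fileDesc/titleStmt/author/name"),
        "year": header.find("fileDesc/publicationStmt/date").get("year"),
        "doi": idnos.get("doi"),
        "issn": idnos.get("ISSN"),
        # Namespaced child element, addressed with ElementTree's {uri}tag syntax.
        "region": header.findtext(
            "fileDesc/sourceDesc/biblStruct/analytic/author/affiliation/"
            f"{{{WICRI_NS}}}regionArea"
        ),
    }


print(parse_record(SAMPLE)["doi"])  # 10.1007/3-540-44596-X_1
```

Injecting the namespace declaration as text keeps the record otherwise untouched. A pipeline with access to the real Wicri namespace URI would declare that one instead.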

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A14 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001A14 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:A9D55CDEED0425A739C61C52479F43C882308A8B
   |texte=   Technology of Text Mining
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024